Adebayo Abdulganiyu KEJI

Image source: ml-ops.org

As the name “pipeline” implies, picture a line of pipes carrying water. Water goes in at one end and comes out at the other, and a machine learning pipeline works the same way: data enters at one end and flows through a series of stages. In machine learning operations (MLOps), the data ingestion component loads the data, Exploratory Data Analysis (EDA) and Feature Engineering (FE) explore it and build useful features from it, and the Data Transformation (DT) component transforms it into a form a model can use.
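The stages above can be pictured as plain functions where the output of one feeds the next, just like water moving from one pipe section into another. Below is a minimal, standard-library-only Python sketch under stated assumptions: the sample CSV data and the function names `ingest`, `engineer_features` and `transform` are hypothetical, and mean imputation plus min-max scaling are only two common choices for these stages, not the only ones.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data standing in for a real source (database, API, file).
RAW_CSV = """age,income,purchased
25,40000,0
32,,1
47,82000,1
51,61000,0
"""

def ingest(raw_text):
    """Data ingestion: parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def engineer_features(rows):
    """Feature engineering: convert types and fill missing income with the mean income."""
    incomes = [float(r["income"]) for r in rows if r["income"]]
    fill = mean(incomes)
    features = []
    for r in rows:
        income = float(r["income"]) if r["income"] else fill
        features.append({
            "age": float(r["age"]),
            "income": income,
            "purchased": int(r["purchased"]),
        })
    return features

def transform(rows, columns=("age", "income")):
    """Data transformation: min-max scale the chosen columns to the range [0, 1]."""
    scaled = [dict(r) for r in rows]  # copy so the input rows stay untouched
    for col in columns:
        lo = min(r[col] for r in rows)
        hi = max(r[col] for r in rows)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        for r in scaled:
            r[col] = (r[col] - lo) / span
    return scaled

# Data flows through the "pipe": ingestion -> feature engineering -> transformation.
data = transform(engineer_features(ingest(RAW_CSV)))
```

Because each stage only depends on the output of the previous one, any stage can be swapped out (for example, a different imputation strategy) without touching the others.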

All of the operations mentioned above deal with data preprocessing. Once preprocessing is complete, the next stage in the workflow is the model training component. This component covers training one or more machine learning models, tuning hyperparameters to obtain optimal model performance, tracking experiments, and registering models in a model registry. The aim of building machine learning models is to help solve important challenges in a business domain and, consequently, to support insightful business decisions that drive sales. To make sure the model we develop actually fits the business problem, machine learning operations also include a model evaluation component. In this stage, standard evaluation metrics such as accuracy, F1 score, precision, and recall are used. Finally, the model is deployed for real-life use.
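To make the training and evaluation stages more concrete, here is a small, standard-library-only Python sketch. It computes accuracy, precision, recall and F1 for binary labels and uses a simple grid search as a stand-in for hyperparameter tuning. The "model" is a hypothetical score threshold and the scores and labels are made-up illustration data; in practice you would tune a real model, and libraries such as scikit-learn already provide these metrics in `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`).

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives (label 1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn

def evaluate(y_true, y_pred):
    """Return accuracy, precision = TP/(TP+FP), recall = TP/(TP+FN) and their harmonic mean F1."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def predict(scores, threshold):
    """Hypothetical toy model: predict 1 when the score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]

def tune_threshold(scores, y_true, candidates):
    """Grid search: keep the candidate threshold with the highest F1 on validation data."""
    return max(candidates, key=lambda th: evaluate(y_true, predict(scores, th))["f1"])

# Made-up validation scores and their true labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
y_true = [0, 0, 1, 1, 1, 0]
best = tune_threshold(scores, y_true, candidates=[0.3, 0.5, 0.75])
metrics = evaluate(y_true, predict(scores, best))
```

With this data the search picks a threshold of 0.3, which gives precision 0.75, recall 1.0 and accuracy 5/6. Which metric to optimize is a business decision: recall matters more when missing a positive is costly, precision when false alarms are.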

All of these components make up a machine learning pipeline. Thus, I would define a machine learning pipeline as a sequence of automated steps, from data gathering/collection to deployment, that makes it possible to reproduce a well-crafted machine learning model with minimal errors and almost no human intervention. Operationalizing machine learning, which gave birth to the pipeline/workflow, has various advantages: it saves development and production time, makes results reproducible, reduces errors, is cost-effective, enables adequate model versioning, and many more.
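The ideas of "sequential automated steps" and "reproducibility" can be shown in a few lines of Python. The sketch below assumes a hypothetical `Pipeline` helper (not a library API) that runs named steps in order and records what ran; the data split uses a fixed random seed, so running the whole pipeline twice produces exactly the same train/test split without any manual intervention.

```python
import random

class Pipeline:
    """Hypothetical helper: run named steps in order, feeding each output into the next step."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, callable) pairs
        self.log = []       # names of the steps that ran, in order

    def run(self, data):
        for name, step in self.steps:
            data = step(data)
            self.log.append(name)
        return data

def split(data, seed=42, test_ratio=0.25):
    """Shuffle with a fixed seed so every run produces the same train/test split."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def make_pipeline():
    return Pipeline([
        ("ingest", lambda _: list(range(20))),           # stand-in for loading real data
        ("transform", lambda xs: [x / 19 for x in xs]),  # scale values to [0, 1]
        ("split", split),                                # reproducible train/test split
    ])

# Two independent runs of the same pipeline.
p1, p2 = make_pipeline(), make_pipeline()
train1, test1 = p1.run(None)
train2, test2 = p2.run(None)
```

Both runs return identical splits, which is the property that lets a team retrain and compare models confidently. Real pipelines extend the same idea by also pinning data versions, code versions and library versions.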

In the real world, we can’t talk about pipelines in machine learning without mentioning orchestration tools such as Airflow, Mage AI, Prefect, Luigi, and Dagster. That said, in a future post I will talk about one widely used tool, Airflow, which represents a pipeline as a Directed Acyclic Graph (DAG) to express the dependencies and relationships among its components.
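To preview why a DAG is a natural fit, here is a short sketch that does not use Airflow's API at all. It uses Python's standard-library `graphlib` module (Python 3.9+) to model the pipeline components as tasks with dependencies; the task names are hypothetical. A topological sort yields a valid execution order, and a cycle is rejected with `graphlib.CycleError`, which is exactly why the graph must be acyclic.

```python
from graphlib import TopologicalSorter, CycleError

# Each task maps to the set of tasks it depends on (hypothetical task names).
pipeline_dag = {
    "ingest": set(),
    "eda": {"ingest"},
    "feature_engineering": {"ingest"},
    "transform": {"eda", "feature_engineering"},
    "train": {"transform"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

# A valid order in which every task runs after all of its dependencies.
order = list(TopologicalSorter(pipeline_dag).static_order())

def has_cycle(graph):
    """Return True if the dependency graph contains a cycle (and so has no valid order)."""
    try:
        list(TopologicalSorter(graph).static_order())
        return False
    except CycleError:
        return True
```

Here `ingest` always comes first and `deploy` always comes last, while `eda` and `feature_engineering` do not depend on each other, so an orchestrator could run them in parallel.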

Goodbye...